A Comparison of Discourse Connective Identification of Coh-Metrix and the Penn Discourse Treebank
Abstract
Coh-Metrix is a linguistic tool that many researchers use to quickly measure the cohesion and coherence of a text. Because it is free, easy to use, and efficient, it is widely used in academic research and analysis, and the results of many of these studies depend on its accuracy. I test the accuracy of Coh-Metrix, focusing on its analysis of discourse connectives. Discourse connectives are the easiest discourse markers to identify computationally and are a major factor in determining the cohesion of a text. Coh-Metrix takes a "bag of words" approach to connectives, labeling every lexical connective in all of its possible senses. I propose that this measurement is too broad and leaves room for misinterpretation. I therefore test Coh-Metrix on selected texts from the Wall Street Journal against the "gold standard" of discourse annotation, the Penn Discourse Treebank. The results call into question the accuracy of Coh-Metrix's connective score.
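To make the concern about the "bag of words" approach concrete, the following is a minimal sketch of context-free connective matching checked against a hand-made gold annotation. It is not Coh-Metrix's implementation: the lexicon `CONNECTIVES`, the sense labels, the example sentence, and the functions `bag_of_words_connectives` and `precision_against_gold` are all assumptions introduced for illustration, and the gold offsets are a stand-in for Penn Discourse Treebank annotation rather than real treebank data.

```python
import re

# Hypothetical connective lexicon with hypothetical sense labels.
# This is NOT the word list Coh-Metrix actually uses.
CONNECTIVES = {
    "and": ["additive"],
    "since": ["temporal", "causal"],
    "while": ["temporal", "contrastive"],
    "because": ["causal"],
}


def bag_of_words_connectives(text):
    """Label every lexicon match with all of its possible senses, ignoring context."""
    hits = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word in CONNECTIVES:
            hits.append((match.start(), word, CONNECTIVES[word]))
    return hits


def precision_against_gold(hits, gold_offsets):
    """Share of lexical matches that the gold annotation also marks as connectives."""
    if not hits:
        return 0.0
    confirmed = sum(1 for start, _, _ in hits if start in gold_offsets)
    return confirmed / len(hits)


# Invented example sentence. "and" in "Stocks and bonds" joins two nouns and
# "since Monday" is a prepositional phrase, so neither links two clauses.
text = "Stocks and bonds fell since Monday, and traders sold because rates rose."
hits = bag_of_words_connectives(text)

# Hand-made stand-in for gold annotation: character offsets of the tokens
# that do function as discourse connectives in this sentence.
gold = {text.index(", and") + 2, text.index("because")}

print(hits)
print(precision_against_gold(hits, gold))
```

In this toy sentence the matcher flags four tokens, but only two of them link clauses, and "since" is assigned both a temporal and a causal sense even though it does not function as a discourse connective here. Under these assumptions, that is the kind of over-counting a comparison against sense-annotated gold data would expose.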
Similar resources
Learning Connective-based Word Representations for Implicit Discourse Relation Identification
We introduce a simple semi-supervised approach to improve implicit discourse relation identification. This approach harnesses large amounts of automatically extracted discourse connectives along with their arguments to construct new distributional word representations. Specifically, we represent words in the space of discourse connectives as a way to directly encode their rhetorical function. E...
Attribution And The (Non-)Alignment Of Syntactic And Discourse Arguments Of Connectives
The annotations of the Penn Discourse Treebank (PDTB) include (1) discourse connectives and their arguments, and (2) attribution of each argument of each connective and of the relation it denotes. Because the PDTB covers the same text as the Penn TreeBank WSJ corpus, syntactic and discourse annotation can be compared. This has revealed significant differences between syntactic structure and dis...
Annotation And Data Mining Of The Penn Discourse TreeBank
The Penn Discourse TreeBank (PDTB) is a new resource built on top of the Penn Wall Street Journal corpus, in which discourse connectives are annotated along with their arguments. Its use of standoff annotation allows integration with a stand-off version of the Penn TreeBank (syntactic structure) and PropBank (verbs and their arguments), which adds value for both linguistic discovery and discour...
The Penn Discourse Treebank
This paper describes a new discourse-level annotation project – the Penn Discourse Treebank (PDTB) – that aims to produce a large-scale corpus in which discourse connectives are annotated, along with their arguments, thus exposing a clearly defined level of discourse structure. The PDTB is being built directly on top of the Penn Treebank and Propbank, thus supporting the extraction of useful sy...
From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank
The present paper reports on preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-...